ABSTRACT

We recently identified and cloned a novel breast cancer-specific gene,
BCSG1, by direct differential cDNA sequencing. BCSG1 shares extensive
sequence homology with the Alzheimer's disease-related neural protein
synuclein (SNC); thus, it was also named SNC-γ. Overexpression of
SNC-γ in breast cancer cells leads to a significant increase in motility and
invasiveness in vitro and a profound augmentation of metastasis in vivo.
Our data suggest that this member of the neural protein SNCs might have 
important functions outside the central nervous system and may play a 
role in breast cancer progression. 



INTRODUCTION 

If sufficiently characterized, the identification of quantitative 
changes in gene expression that occur in the malignant mammary 
gland may yield novel molecular markers that may be useful in 
understanding breast cancer development and progression (1). Within 
this context, we have previously reported the isolation of differentially 
expressed genes in cDNA libraries from normal breast tissue and 
infiltrating breast cancer using the expressed sequence tag-based 
differential cDNA sequencing approach (2, 3). Of the many putative 
differentially expressed genes (2, 3), BCSG1, which was identified as 
a group of expressed sequence tags specifically expressed in the 
mammary gland relative to other organs and abundantly expressed in 
a breast cancer cDNA library but scarcely seen in a normal breast 
cDNA library, was identified as a putative breast cancer-specific gene 
(2). 

Interestingly, BCSG1 revealed no homology to any known
growth factors or oncogenes; however, BCSG1 revealed extensive
sequence homology to the Alzheimer's disease (AD)-related neural proteins called SNCs
that are expressed mainly in the brain and localized to presynaptic
terminals (4-7). The pathological hallmark of AD is amyloid deposition
in neuritic plaques and blood vessels (8). Two major intrinsic
constituents of amyloid are a 39-43-amino acid peptide named the
Aβ component (8) and the recently identified non-Aβ component (4).
The non-Aβ component of the AD precursor was cloned from a
human brain library (4) and named SNCA because it shares a 95% 
sequence homology with rat SNC. Recently, a second SNC named 
SNCB was cloned from human brain, and it has a 61% sequence 
identity with SNCA (6). The previously identified BCSG1, which is 
also highly expressed in the brain (2), has a 54 and 56% sequence 
identity with SNCA and SNCB, respectively, and has been renamed 
SNCG (9). Thus, the previously unrecognized homology between 
these proteins defines a family of human brain SNCs that currently 
has three members. Although SNCs are abundant proteins expressed
in presynaptic terminals and are strongly associated with amyloid
plaques in AD and Lewy bodies in Parkinson's disease (PD) (10), their functions have not yet
been defined. SNCA aggregation may be important in the etiology and 
pathogenesis of neurodegenerative disorders such as AD and PD (10). 
During its identification as a breast cancer-specific gene, we previously
demonstrated stage-specific SNCG expression as follows: (a)
SNCG was undetectable in normal or benign breast lesions; (b) SNCG
showed partial expression in ductal carcinoma in situ; and (c) SNCG 
was expressed at an extremely high level in advanced infiltrating 
breast cancer. The effects of SNCG on breast cancer growth and 
metastasis were investigated in the current studies. 

MATERIALS AND METHODS 

Transfection. Full-length SNCG cDNA was inserted into a pCI-neo mam- 
malian expression vector, and the resulting vector was transfected into MDA- 
MB-435 cells as described previously (3, 11). 

Preparation of Conditioned Medium (CM). All of the clones were maintained in subconfluent
monolayers with 10% FCS. The medium was discarded, and the monolayers 
were washed twice with PBS. The monolayers were cultured in the absence of 
serum in DMEM supplemented with transferrin (1 mg/liter), fibronectin (1 
mg/liter), and trace elements (Biofluids, Rockville, MD). After 24 h, the 
serum-free medium was discarded, and the cells were replenished with the 
fresh serum-free medium. The CM were collected 30 h later. Media were then 
centrifuged at 1,200 × g, and the supernatants were saved and concentrated
approximately 5-fold using an Amicon hollow fiber concentrator with a Mr
10,000 cutoff at 4°C. The protein concentrations of CM were determined and
normalized. 

MMP Activity. The MMP enzymatic activity of the CM was assayed using 
a quenched fluorescent substrate, Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2
(Bachem), as described previously (12). The CM were pretreated with APMA
for activation (13). 

In Vitro Invasion and Motility Assay. As described previously (11), cell 
invasion and motility were analyzed in a modified Boyden chamber assay 
using 8-µm polycarbonate membranes coated with 4 mg/ml growth factor-reduced
Matrigel.

Tumor Growth in Athymic Nude Mice. A tumorigenic assay was performed
in nude mice as described previously (3, 11). Briefly, approximately
0.4 × 10^6 cells (0.15 ml) were injected into a 5-6-week-old female athymic
nude mouse (Frederick Cancer Research and Development Center, Frederick, 
MD). Each animal received two injections, one on each side, in the mammary 
fat pads between the first and second nipples. Tumor size was determined at 
weekly intervals by three-dimensional measurements (in millimeters) using a
caliper. Only measurable tumors were used to calculate the mean tumor 
volume for each tumor cell clone at each time point. Animals were sacrificed 
32-40 days after injection, when the largest tumors reached about 15 mm in 
diameter. 
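
The text does not state how the three caliper measurements were converted into a tumor volume. As a minimal sketch, assuming the common ellipsoid approximation V = π/6 × length × width × height (an assumption, not the authors' stated method), the mean volume and standard error for one clone, restricted to measurable tumors as described above, might be computed as follows. The function names and the example readings are hypothetical.

```python
import math
from statistics import mean, stdev

def ellipsoid_volume_mm3(length_mm, width_mm, height_mm):
    """Ellipsoid volume from three orthogonal diameters (assumed formula)."""
    return math.pi / 6.0 * length_mm * width_mm * height_mm

def mean_and_se(volumes):
    """Mean and standard error over measurable (non-zero) tumors only."""
    measurable = [v for v in volumes if v > 0]
    if not measurable:
        return None, None
    m = mean(measurable)
    # Standard error needs at least two values; report 0.0 for a single tumor.
    se = stdev(measurable) / math.sqrt(len(measurable)) if len(measurable) > 1 else 0.0
    return m, se

# Hypothetical caliper readings (mm) for one clone at one time point.
# (0, 0, 0) marks an injection site without a measurable tumor.
readings = [(10, 8, 6), (12, 9, 7), (0, 0, 0), (11, 10, 8)]
volumes = [ellipsoid_volume_mm3(*r) for r in readings]
mean_volume, se_volume = mean_and_se(volumes)
print(f"mean = {mean_volume:.1f} mm^3, SE = {se_volume:.1f} mm^3")
```

Excluding non-measurable sites from the mean matches the rule given in the text; tumor incidence is then reported separately.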

Assessment of Regional Lymph Node and Lung Metastasis. As de- 
scribed previously (11), the axillary lymph nodes and lungs of sacrificed 
animals were excised, weighed, fixed in formalin, embedded in paraffin, and 
stained with H&E for a microscopic examination for morphological evidence 
of tumor metastasis. Sections were reviewed and scored by two pathologists. 

Antibody Production. The purified synthetic SNCG peptide correspond- 
ing to amino acids 101-117 (2) was conjugated and injected into New Zealand 
rabbits as reported previously (12). The antiserum was purified using SNCG
peptide affinity chromatography.
Fig. 1. Transfection of SNCG into MDA-MB-435 cells. A, Northern blot. Each lane
contained 30 µg of total RNA. B, Western blot with an affinity-purified specific SNCG
peptide polyclonal antibody. Each lane contained 20 µg of protein. Lane 1, neo-435-1;
Lane 2, SNCG-435-2; Lane 3, SNCG-435-1; Lane 4, SNCG-435-3; Lane 5, neo-435-2.

RESULTS AND DISCUSSION 

Transfection of SNCG into MDA-MB-435 Human Breast Can- 
cer Cells. To determine the effects of SNCG on invasion/metastasis, 
we selected MDA-MB-435 human breast cancer cells as recipients for 
SNCG-mediated gene transfection due to their lack of detectable 
SNCG transcript (2) and their highly tumorigenic and aggressive 
phenotype in nude mice (11). Cells were transfected with a plasmid 
vector containing a neomycin resistance gene (neo clones) or with the 
same vector containing full-length SNCG cDNA (SNCG clones). 
MDA-MB-435 clones expressing SNCG were designated as SNCG-435
clones, and the control neo-transfected cells were designated as
neo-435 clones. Fig. 1 shows the Northern blot and Western blot
analyses of SNCG expression in selected clones. All selected SNCG-435
clones expressed SNCG mRNA transcripts and protein. In contrast,
none of the neo-435 clones produced any detectable SNCG
transcript or protein. No changes in morphology were observed in
these clones. Based on the level of SNCG expression, we selected 
SNCG-435-1, SNCG-435-3, neo-435-1, and neo-435-2 clones for the 
subsequent studies. 

In Vitro Growth of SNCG-transfected Cells. To determine 
whether SNCG overexpression affects the growth of MDA-MB-435 
cells, cells from exponentially growing cultures of different MDA- 
MB-435 clones were seeded in triplicate at 3000 cells/well (24-well 
plate) in 1 ml of DMEM-5% serum. The growth rates of SNCG- 
positive SNCG-435-1 and SNCG-435-3 cells were compared with 
those of SNCG-negative neo-435-1 and neo-435-2 cells in a mono- 
layer culture. No significant differences in growth rate were observed 
among SNCG-positive and SNCG-negative cells (data not shown). 

Metastasis in the Orthotopic Nude Mice Model. Because SNCG 
was highly expressed in the infiltrating breast cancer cells relative to 
benign or noninvasive in situ carcinomas (2), we were interested in
studying whether SNCG is an instigator of metastasis or merely a
correlative product during breast cancer progression. The effect of 
SNCG expression on metastasis was assayed in an in vivo orthotopic 
(mammary fat pad) nude mouse model. Two independent experiments 
were done to confirm reproducibility, and the data from these exper- 
iments are summarized in Table 1. After a lag phase of 10 days, mice 
given implants of both SNCG-positive and SNCG-negative cells 
developed tumors. There was no difference in tumor incidence be- 
tween neo-435 and SNCG-435 clones. Starting at about 20 days after 
inoculation, tumor necrosis was observed in tumors derived from 
SNCG-435-1 and SNCG-435-3 cells. Neo-435-1 and neo-435-2 cells 
also developed some tumor necrosis, but with less intensity. Consist- 
ent with the similar in vitro growth rates, there was no significant 
difference in primary tumor size between the neo-435 and SNCG-435 
clones at 40 days after injection. 

To study tumor dissemination, axillary lymph nodes and lungs were 
examined physically at autopsy and then subjected to microscopic 
examination for morphological evidence of tumor cells by light mi- 
croscopy on H&E-stained paraffin sections. For the axillary lymph 
node, the average weight was 15 mg for neo-435 mice and 44 mg for 
SNCG-435 mice. The increased lymph node weight reflects invasion
of the nodes by breast tumor. Representative H&E-stained sections for neo-435
and SNCG-435 lymph nodes are shown in Fig. 2, A and B.
Microscopic examination indicated that SNCG-435-1 and SNCG-435-3
mice showed a significantly higher average lymph node positivity
(64 and 77%) compared to that (27%) of SNCG-negative
neo-435-1 and neo-435-2 mice (Table 1). For lung metastases, the
numbers of visible nodules on the surface of the lungs increased 
dramatically from an average of 1 for neo-435 mice to an average of 
23 for SNCG-435 mice (Table 1). Representative lungs are
shown in Fig. 2C. When these lungs were examined microscopically,
large numbers of micrometastases were observed in SNCG-435 mice; 
the lungs of neo-435 mice had significantly fewer micrometastases 
(data not shown). Representative H&E-stained sections for neo-435 
and SNCG-435 lungs are shown in Fig. 2, D-G. To our knowledge, 
human breast cancer cells usually do not form such a profound 
regional and metastatic tumor dissemination (visible lung nodules) in 
the spontaneous mammary fat pad nude mouse model. This dramatic 
SNCG-stimulated metastasis suggests a role for SNCG as a key 
positive regulator for breast cancer invasion and metastasis. The 
overexpression of SNCG in malignant infiltrating breast epithelial 
cells compared to the low expression level in noninvasive in situ 
carcinoma (2) suggests that SNCG expression is a meaningful marker 
for breast cancer malignant progression and may signal the more 



Table 1 Effects of SNCG on tumor incidence, tumor size, and axillary lymph node and lung metastasis
Cells (400,000) were injected at day 1 into the mammary fat pads, and tumor volumes and lymph node and lung micrometastases were determined. Lymph node metastases were
measured by microscopic examination for morphological evidence of tumor cells on the fixed axillary lymph nodes. Lung metastases were measured by the presence of visible tumor
nodules on the surface of the lung. Volumes are expressed as the means ± SEs (number of tumors assayed). Experiment 1 had a total of 16 injections for eight mice in each group,
and the mice were killed 42 days after injection. For experiment 2, there was a total of 10 injections for five mice in each group, and the mice were killed 38 days after injection.
Statistical comparisons for SNCG-positive clones and SNCG-negative clones showed that there was no significant difference in the mean tumor sizes between pooled SNCG-positive
and pooled SNCG-negative tumors. The lymph node positivity of pooled SNCG-435-1 tumors versus combined pooled SNCG-negative neo-435-1 and neo-435-2 tumors was P < 0.039
and P < 0.029 for pooled SNCG-435-3 tumors versus SNCG-negative tumors. Statistical comparison of primary tumors was analyzed by Student's t test. A χ² test was used for a
statistical analysis of lymph node metastasis.

                         Primary tumor                     Lymph node                                         Lung metastasis
Experiment  Clone        Volume         Tumors/total (%)   Average weight (mg)   No. positive/total no. (%)   No. of nodules
1           neo-435-1    1.74 ± 0.52    16/16 (100)        14                    3/16 (19)                    0
1           neo-435-2    1.9 ± 0.31     14/16 (88)         18                    4/15 (27)                    2
1           SNCG-435-1   1.45 ± 0.37    15/16 (94)         43                    10/15 (67)                   19
1           SNCG-435-3   1.78 ± 0.31    16/16 (100)        50                    12/16 (75)                   31
2           neo-435-1    1.35 ± 0.39    9/10 (90)          12                    3/10 (30)                    1
2           neo-435-2    1.69 ± 0.44    10/10 (100)        15                    3/9 (33)                     1
2           SNCG-435-1   1.73 ± 0.45    10/10 (100)        45                    6/10 (60)                    24
2           SNCG-435-3   1.49 ± 0.34    10/10 (100)        39                    7/9 (78)                     17
Fig. 2. Axillary lymph nodes and lung metastasis from neo-435 mice and SNCG-435 mice. The mice were sacrificed at day 40 after cell injection. Lymph nodes and lungs were
isolated, and some were subjected to H&E staining. Representative axillary lymph nodes from a neo-435-1 mouse (A) and a SNCG-435-3 mouse (B) are shown. Arrow, an invasive
breast tumor that mainly occupied the lymph node in a SNCG-435-3 mouse. A and B, ×10. C, representative lung metastases from mice injected with SNCG-positive and
SNCG-negative cells. The left lung was from a neo-435-1 mouse, and the right lung was from a SNCG-435-3 mouse. The metastatic tumors only reflect the nodules on the surface
of the lungs (×2.5). D-G, microscopic examination of representative lung metastases in H&E-stained sections. D, a lung without metastases from a neo-435-1 mouse. E, a lung with
micrometastases from a neo-435-2 mouse. F, a lung with a small breast tumor nodule from a SNCG-435-1 mouse. G, a lung with a large breast tumor nodule from a SNCG-435-3
mouse. Arrows, breast tumors or cancer cells. D-G, ×20.






SNCG IN HUMAN BREAST CANCER METASTASIS 






Fig. 3. Analysis of the MMP activities of SNCG-positive and SNCG-negative cells. 
The pooled CM from SNCG-negative neo-435-1 and neo-435-2 cells and SNCG-positive
SNCG-435-1 and SNCG-435-3 cells were collected, concentrated 5-fold, normalized for
protein concentrations, and subjected to MMP activity analysis. Recombinant APMA-activated
MMP-2 (80 ng) was used as a positive control. All values were normalized to
the enzymatic activity of the recombinant MMP-2, which was taken as 100%. The 
numbers represent the means ± SD of three tests. 



advanced invasive/metastatic phenotype of human breast cancer. In 
this regard, the up-regulation of SNCG expression may facilitate 
breast cancer progression leading to metastasis. 
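
The group averages quoted in the text (lymph node weight, lymph node positivity, and lung nodules) appear to be consistent with unweighted means of the per-experiment values in Table 1. That reading is an inference, not something the authors state. A short sketch that recomputes the averages under this assumption, with the values transcribed from Table 1 and hypothetical variable names:

```python
from statistics import mean

# Values transcribed from Table 1, listed as [experiment 1, experiment 2].
table1 = {
    "neo-435-1":  {"ln_weight_mg": [14, 12], "ln_positive_pct": [19, 30], "lung_nodules": [0, 1]},
    "neo-435-2":  {"ln_weight_mg": [18, 15], "ln_positive_pct": [27, 33], "lung_nodules": [2, 1]},
    "SNCG-435-1": {"ln_weight_mg": [43, 45], "ln_positive_pct": [67, 60], "lung_nodules": [19, 24]},
    "SNCG-435-3": {"ln_weight_mg": [50, 39], "ln_positive_pct": [75, 78], "lung_nodules": [31, 17]},
}

def group_mean(clones, key):
    """Unweighted mean of every per-experiment value for the listed clones."""
    return mean([value for clone in clones for value in table1[clone][key]])

neo = ["neo-435-1", "neo-435-2"]
sncg = ["SNCG-435-1", "SNCG-435-3"]

print("lymph node weight (mg):", group_mean(neo, "ln_weight_mg"), group_mean(sncg, "ln_weight_mg"))
print("lung nodules:", group_mean(neo, "lung_nodules"), group_mean(sncg, "lung_nodules"))
print("lymph node positivity (%):",
      group_mean(["SNCG-435-1"], "ln_positive_pct"),
      group_mean(["SNCG-435-3"], "ln_positive_pct"),
      group_mean(neo, "ln_positive_pct"))
```

The unrounded means round to the figures quoted in the text when halves are rounded up. Pooling the raw counts instead of averaging the percentages gives slightly different positivity values, which is why the averaging assumption is stated explicitly.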

MMP Activity. In an effort to investigate the molecular mecha- 
nisms underlying SNCG-induced metastasis, we studied several inva- 
sion-related factors, including MMP and cell motility. The amyloid 
protein has recently been demonstrated to be a strong stimulator of 
MMP-2 and MMP-9 expression in astrocytes (14). It is well estab- 
lished that the overproduction and unrestrained activity of MMPs, 
particularly MMP-2 and MMP-9, are linked to the malignant conver- 
sion of a variety of different tumor cells (15-22) including mammary 
tumors (18-22). It was therefore of interest to test whether SNCG, an amyloid-related
protein, stimulates MMP-2 and MMP-9 expression in breast
cancer cells and thereby leads to a more metastatic phenotype. We investigated
whether SNCG overexpression would increase MMP activity in
MDA-MB-435 cells. In this regard, the pooled CM from two SNCG-negative
clones and the pooled CM from two SNCG-positive clones were
concentrated and subjected to an MMP enzymatic assay. As shown in
Fig. 3, no significant differences in the basal levels of proteolytic 
activities were observed between neo-435 and SNCG-435 clones. 
Mammalian MMPs are usually secreted as latent proenzymes (zymo- 
gen) and require activation for their enzymatic activity. The incuba- 
tion of CM with the MMP activator organomercurial compound 
APMA resulted in an approximately 2-fold increase in proteolytic 
activity for the CM from both neo-435 and SNCG-435 clones. How- 
ever, no significant difference in APMA-activated MMP activities 
was observed between neo-435 and SNCG-435 clones. Because the 
measured enzymatic activity represents the net MMP activity, reflect- 
ing the balance between activated MMPs and the tissue inhibitors of 
metalloproteinase, our data suggest that SNCG-induced metastasis 
may not be mediated by the regulation of MMP and tissue inhibitors 
of metalloproteinase. 
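
The normalization used for Fig. 3 expresses each CM activity as a percentage of the recombinant MMP-2 positive control, which is taken as 100%. A minimal sketch of that arithmetic follows. The fluorescence rates are hypothetical placeholders chosen only to illustrate a 2-fold APMA effect with no difference between clones; they are not measurements from this study.

```python
def percent_of_control(sample_rate, control_rate):
    """Express a sample's activity as a percentage of the positive control."""
    if control_rate <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * sample_rate / control_rate

# Hypothetical fluorescence rates (arbitrary units per minute).
control_mmp2 = 50.0  # APMA-activated recombinant MMP-2, defined as 100%
cm_rates = {
    "neo-435 CM": 10.0,
    "neo-435 CM + APMA": 20.0,
    "SNCG-435 CM": 10.0,
    "SNCG-435 CM + APMA": 20.0,
}

normalized = {name: percent_of_control(rate, control_mmp2) for name, rate in cm_rates.items()}
for name, pct in normalized.items():
    print(f"{name}: {pct:.0f}% of control")
```

Because the CM were first normalized for protein concentration, differences in the normalized values would reflect net MMP activity per unit of secreted protein rather than differences in cell number.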



Stimulation of Invasiveness and Motility of MDA-MB-435 Cells 
by SNCG. We used an in vitro reconstituted basement membrane
(Matrigel) invasion assay to determine the effect of SNCG on cell
invasion. All three SNCG-negative cell lines (parental MDA-MB-435,
neo-435-1, and neo-435-2) were moderately invasive. At the end of a
48-h incubation, an average of approximately 250 SNCG-negative
cells had crossed the Matrigel barrier. A significant stimulation of
invasiveness was noted in two SNCG-positive clones, with a 3-fold
increase for SNCG-435-1 cells and a 4.3-fold increase for SNCG-435-3
cells (Fig. 4A). We also investigated the effect of SNCG on cell
migration without Matrigel. A similar SNCG-stimulated pattern of
migration was observed. At the end of a 24-h incubation, migration of
SNCG-435-1 cells was 4-fold, and that of SNCG-435-3 cells 4.2-fold,
the average for SNCG-negative cells (Fig. 4B). The similar
magnitude of the invasion-stimulating and migration-stimulating ac- 
tivity of SNCG suggests that the increased invasion in SNCG clones 
may be mediated by an alteration of cell motility. To determine 
whether the increased cell motility is mediated by chemotaxis due to 
the different concentrations of serum or chemoattractants in the top 
and bottom chambers, we compared the migration of SNCG-435-3 
and neo-435-1 cells under three different culture conditions: (a)
serum-free conditions; (b) serum with gradient; and (c) serum without 
gradient. As shown in Fig. 5, although the migration was relatively 
Fig. 4. Stimulation of invasiveness and migration of MDA-MB-435 cells by SNCG.
Cells were seeded at a density of 30,000 cells/ml/well on 8-µm polycarbonate membranes
coated with (A) or without (B) 4 mg/ml growth factor-reduced Matrigel. The top chamber
contained 5% FCS, and the bottom chamber contained 10% FCS. A, after incubation in a
humidified incubator with 5% CO2 at 37°C for 48 h, the medium and cells were removed
from the bottom chambers and counted using a microscope. B, cells were cultured under
the same conditions as described in A. The number of cells that migrated was counted after
a 24-h incubation. All values were expressed as the number of invaded cells. The numbers 
represent the means ± SD of three cultures. 






Fig. 5. Comparison of the cell migration of SNCG-435-3 and neo-435-1 cells under 
different conditions. Cells were cultured on noncoated membrane at a density of 30,000 
cells/ml/well. The cells that migrated were harvested at 32 h after incubation. A, 0% serum 
in both the top and bottom chambers. B, 2% serum in both the top and bottom chambers. 
C, 2% serum in the top chamber and 10% serum in the bottom chamber. All values were 
expressed as the number of invaded cells. The numbers represent the means ± SD of 
triplicate wells.



low under serum-free conditions, there was a 2.8-fold increase in 
migration in SNCG-435-3 cells compared with that in neo-435-1 cells. 
When 2% serum was added in the top chamber, the migration of both 
SNCG-positive and -negative cells increased significantly. However, 
the migration of SNCG-435-3 cells was not affected by the serum
gradient: approximately 1600 SNCG-435-3 cells migrated
into the bottom chamber whether it contained 2% serum or 10% serum.
These data suggest that the increased migration in SNCG-positive 
cells is not likely to be mediated by chemotaxis but rather by high 
motility features intrinsic to the cells. 
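
The fold changes reported above are ratios of cell counts in SNCG-positive clones to the average count for SNCG-negative cells. As a rough sketch, the counts implied by the reported invasion fold values, and the absence of a gradient effect on SNCG-435-3 migration, can be worked out as follows. The implied counts are derived here from the text's approximate figures and are not reported data; the function and variable names are hypothetical.

```python
def fold_change(treated_count, control_count):
    """Ratio of a treated count to the control count."""
    if control_count <= 0:
        raise ValueError("control count must be positive")
    return treated_count / control_count

# Approximate average number of SNCG-negative cells crossing Matrigel (from the text).
control_invaded = 250
reported_folds = {"SNCG-435-1": 3.0, "SNCG-435-3": 4.3}

# Counts implied by the reported fold increases (derived, not measured).
implied_counts = {clone: fold * control_invaded for clone, fold in reported_folds.items()}
for clone, count in implied_counts.items():
    print(f"{clone}: about {count:.0f} invaded cells")

# SNCG-435-3 migration was about 1600 cells with either 2% or 10% serum below,
# so the ratio between the gradient and no-gradient conditions is about 1.
gradient_effect = fold_change(1600, 1600)
print(f"gradient / no gradient: {gradient_effect:.1f}")
```

A ratio near 1 between the gradient and no-gradient conditions is what the text interprets as evidence against chemotaxis as the main driver of the increased migration.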

Many breast tumors go through a series of events from the time of 
initial detection to the formation of the lethal invasive and metastatic 
stage. According to the three-step hypothesis of invasion (23), cell 
adhesion, local proteolysis, and subsequent migration or motility are 
key steps in the traversal of the basement membrane and connective 
tissue. In this study, we provide evidence linking the overexpression 
of neural protein SNCG, a previously identified breast cancer-specific 
gene (2), in human breast cancer cells with increased motility and 
invasive activity in vitro and a profound augmentation of metastasis in 
vivo. 

SNC proteins have a structural resemblance to apolipoproteins but 
are abundant in the neuronal cytosol and are present in enriched 
amounts at presynaptic terminals (9). SNCs have been specifically 
implicated in two diseases: AD and PD. In AD patients, a peptide 
derived from SNCA forms an intrinsic component of plaque amyloid 
(9). In PD patients, a SNCA allele is genetically linked to several 
independent familial cases, and the protein appears to accumulate in 
Lewy bodies (9). The general significance of the involvement of 
neural protein SNCG in cancer metastasis is unknown. Recently, 
SNCA and SNCB were identified as two abundant proteins through 
their reactivity with a monoclonal antibody recognizing MAP-τ (6) on
immunoblots. In eukaryotic cells, microtubules, actin, and intermediate
filaments interact to form the cytoskeletal network involved in the
determination of cell architecture, mitosis, differentiation, and motil- 
ity (24). Cytoskeletal organization and dynamics depend on protein 
self-associations and interactions with regulatory elements such as 
MAPs. There is increasing evidence that MAPs, including MAP-τ,
play a critical role in inducing microtubule assembly and controlling 
the dynamic instability of microtubules, thus controlling the state of 
their assembly and organization in cells (reviewed in Ref. 24). SNCG 
may interact with MAPs and regulate the cytoskeletal organization 
and dynamics, leading to increased motility. Nevertheless, our data 
indicate that the increased expression of SNCG correlates with breast 
cancer progression (2) and leads to a more malignant metastatic 
phenotype. We also demonstrated that SNCG expression in breast
cancer cells was subject to cytokine regulation and dramatically
suppressed by the tumor growth inhibitor oncostatin M (OM), and that this OM-induced
transcriptional suppression of the SNCG gene was associated
with OM-mediated growth inhibition. 4 OM is an antitumor cytokine 
produced mainly by activated T cells and macrophages (25), and its 
growth-suppressing activity has been well studied in breast cancer 
cells (26-28). One characteristic of the host response to tumor pro- 
gression is the infiltration of tumors by macrophages and T lympho- 
cytes. The production of tumor-suppressing cytokines in a timely and 
locally (in situ) released fashion may represent an important function 
of the host defense system in suppressing tumor progression. From
this perspective, the dramatic suppression of SNCG expression
in malignant breast cells by OM may represent the host-mediated 
tumor suppression leading to the inhibition of breast cancer progres- 
sion. 

This is the first report indicating the potential involvement of SNC 
in a non-neural disease. An elucidation of the reasons for SNCG 
overexpression in infiltrating breast cancer and SNCG-induced me- 
tastasis may shed some light on the pathogenesis of breast cancer 
progression as well as neurodegenerative disorders. 
ABSTRACT

A high-throughput direct differential cDNA sequencing approach was
employed to identify genes differentially expressed in normal breast as
compared with breast cancer. Approximately 6000 expressed sequence
tags (ESTs) from cDNA libraries of normal breast and breast carcinoma
were selected randomly and subjected to EST-sequencing analysis. The
relative expression levels of more than 2000 unique EST groups were
quantitatively compared in normal versus cancerous breast. Of many
putative differentially expressed genes, a breast cancer-specific gene,
BCSG1, which was expressed in high abundance in a breast cancer cDNA
library but scarcely in a normal breast cDNA library, was identified as a
putative breast cancer marker. In situ hybridization analysis demonstrated
stage-specific BCSG1 expression as follows: BCSG1 was undetectable
in normal or benign breast lesions, showed partial expression in
ductal carcinoma in situ, but was expressed at an extremely high level in
advanced infiltrating breast cancer. The predicted amino acid sequence of
the BCSG1 gene has a significant sequence homology to the non-amyloid β
protein fragment of the Alzheimer's disease amyloid protein. BCSG1
overexpression may indicate breast cancer malignant progression from
benign breast or in situ carcinoma to the highly infiltrating carcinoma.



INTRODUCTION 

The onset and progression of breast cancer is accompanied by 
multiple genetic changes that result in qualitative and quantitative 
alterations in individual gene expression (1). Our hypothesis is that
many of these quantitative genetic changes manifest themselves as 
alterations in the cellular complement of novel transcribed mRNAs. 
Identification of these mRNAs, if sufficiently characterized, could 
provide clinically useful information for patient management and 
prognosis while enhancing our understanding of breast cancer patho- 
genesis. Although pathological end points such as tumor size, lymph 
node status, and status of estrogen receptor and progesterone receptor 
remain the most useful guides in prognosis and in selecting treatment 
strategies for breast cancer (2), there is a need to further investigate 
the molecular mechanisms that determine the properties of an indi- 
vidual tumor, e.g., probability of metastasis. Although numerous 
prognostic factors have now been identified, few have contributed to 
defining the clinical response to therapy. 

Identification of quantitative changes in gene expression that occur 
in the malignant mammary gland, if sufficiently characterized, may 
yield novel molecular markers that may be useful in the diagnosis and 
treatment of human breast cancer. Several differential cloning meth- 
ods, such as differential display PCR and subtractive hybridization, 
have been used to identify the genes differentially expressed in breast 
cancer biopsies, as compared to normal breast tissue controls (3-7).
However, these investigations have involved the relatively time- and 
labor-intensive steps of subcloning, library screening, and cDNA 
sequencing of individual genes (4, 8). On the other hand, creation of 
libraries is a rapid method used to identify or "tag" sequences that are 
expressed in specific tissues (9, 10). Since the introduction of the 
EST 3 sequencing approach, many novel human genes have been 
discovered (9, 10). The advantage of this methodology, compared to 
isolation and sequencing of individual cDNAs, is that a large number 
of sequences can be "catalogued" with small amounts of sequencing
data. 

With the availability of tens of thousands of ESTs, researchers now 
shift their attention to the unveiling of the expression profile of
individual genes or patterns of genes in normal versus diseased states.
Several newly developed strategies, such as the serial analysis of gene
expression (11) and cDNA microarray (12) methods, have demonstrated
potential for broad application for quantitative analysis of
differential patterns of gene expression. Within this context, we undertook
a search, using the differential cDNA sequencing approach,
for isolation of differentially expressed ESTs and the possible presence
of the new marker genes for breast cancer. In this initial report,
we describe a novel BCSG named BCSG1 that is overexpressed in
advanced infiltrating breast cancer cells but not in normal or benign
breast lesion. The expression pattern of BCSG1 may be a meaningful
marker in the development of breast cancer. 



MATERIALS AND METHODS 

Reagents. Restriction enzymes, T7 polymerase, random primer DNA la- 
beling kit, and digoxigenin-labeled nucleotides were obtained from Boehringer 
Mannheim (Indianapolis, IN). [32P]dATP was purchased from Amersham
Corp. 

Differential cDNA Sequencing. We have used EST analysis to search for 
new genes differentially expressed in breast cancer versus normal breast tissue. 
A data base containing approximately 500,000 human partial cDNA sequences 
(ESTs) has been established in a collaborative effort between the Institute for 
Genomic Research and Human Genome Sciences, Inc., using high-throughput 
automated DNA sequence analysis of randomly selected human cDNA clones
(10). RNAs from a stage III breast carcinoma and patient-matched normal
breast were isolated and subjected to preparation of cDNA libraries. EST-automated
DNA sequence analysis was performed on randomly selected
cDNA clones. Both libraries had about 60% novel gene sequences, which did
not match exactly to published human genes. A total of 3048 ESTs from the breast
cancer cDNA library and 2886 ESTs from the normal breast cDNA library
were randomly picked and sequence analyzed. The ESTs with overlapping
sequences were grouped into unique EST groups, with each EST group 
representing a gene or a family of sequence-related genes. Each unique EST 
group without overlapping sequences was analyzed for its relative expression 
by examining the number of expressed individual ESTs in the libraries of 
normal versus diseased tissues. There were more than 2200 EST groups that 
were analyzed for quantitative comparison of EST "hits" in the pair of cDNA 
3 The abbreviations used are: EST, expressed sequence tag; BCSG, breast cancer-specific
gene; Aβ, amyloid β protein; AD, Alzheimer's disease; DCIS, ductal carcinoma
Table 1 Partial list of differentially expressed genes in normal versus cancerous
breasts identified by differential cDNA sequencing

Complementary DNA libraries were established from a stage III breast carcinoma and
patient-matched normal breast. A total of 5934 ESTs were randomly picked and sequence
analyzed. More than 2200 distinctive EST groups were analyzed for quantitative comparison
of EST hits in the pair of cDNA libraries from breast cancer versus normal breast
as described in "Materials and Methods." The same EST groups were also analyzed by 
examining the tissue-specific expression against the total of 500,000 ESTs from a variety 
of different cDNA libraries. Only a unique EST group with more than three breast-specific 
EST hits was listed, and the rest of the several dozen EST groups with fewer than four 
breast-specific EST hits were omitted in this list. 
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libraries from normal breast versus breast cancer by exumining the expression 
of individual EST sequences. The number of EST hits in the libraries reflects 
the relative expression or mRNA transcript copy numbers of the EST. This 
direct differential cDNA sequence, utilizing the direct EST sequencing anal- 
ysis simultaneously on a pair of cDNA libraries made from normal breast and 
breast cancer tissue, was used to study the expression profile of individual 
genes and patterns of genes in normal breast versus breast cancer tissue. 
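
The comparison of EST hits between the two libraries can be sketched in a few lines of Python. This is an illustrative sketch rather than the software used in the study: the function names are hypothetical, the library sizes (3048 and 2886 ESTs) are those given above, and the example hit counts (cathepsin D, 5 versus 1; BCSG1, 6 versus 1) are those quoted in Results. Normalizing by library size is an assumption made here for illustration; because the two libraries are of nearly equal size, it changes the raw ratio only slightly.

```python
# Illustrative sketch only; function names are hypothetical, not from the study.
# ESTs sequenced per library, as stated in the text.
LIBRARY_SIZE = {"cancer": 3048, "normal": 2886}


def hits_per_thousand(hits, library):
    """EST hits for one EST group per 1000 ESTs sequenced in the given library."""
    return 1000.0 * hits / LIBRARY_SIZE[library]


def cancer_to_normal_ratio(cancer_hits, normal_hits):
    """Ratio of size-normalized hit frequencies (cancer / normal).

    Returns None when the EST group has no hits in the normal library,
    because the ratio is undefined in that case.
    """
    if normal_hits == 0:
        return None
    cancer = hits_per_thousand(cancer_hits, "cancer")
    normal = hits_per_thousand(normal_hits, "normal")
    return cancer / normal


if __name__ == "__main__":
    # (hits in cancer library, hits in normal library), as quoted in Results.
    examples = {"cathepsin D": (5, 1), "BCSG1": (6, 1)}
    for gene, (cancer_hits, normal_hits) in examples.items():
        ratio = cancer_to_normal_ratio(cancer_hits, normal_hits)
        print(f"{gene}: {cancer_hits} vs {normal_hits} hits, normalized ratio {ratio:.2f}")
```

With hit counts this small, such ratios are best read as a screening signal for follow-up rather than as a quantitative measurement.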

Tissue-specific Expression Analysis. Analysis of relative expression of 
breast-derived ESTs versus their expression in other tissues was performed. 
The differentially expressed EST groups identified by differential cDNA
sequencing were analyzed for tissue-specific expression against the total of
500,000 ESTs from a variety of different cDNA libraries.

Northern Analysis. Total RNA was extracted from tissues according to the 
method of Chomczynski and Sacchi (13). The RNA from human breast cancer
cells was prepared using the RNA isolation kit RNAzol B (Tel-Test, Inc.)
according to the manufacturer's instructions. Equal aliquots of RNA were electro-
phoresed in a 1.2% agarose gel containing formaldehyde and transferred to a
nylon membrane (Boehringer Mannheim). The membrane was prehybridized
with ExpressHyb hybridization solution (Clontech, Inc.) at 68°C for 30 min.
The hybridization was carried out in the same solution with 32P-labeled
BCSG1 probe (1.5 × 10^6 cpm/ml) for 1 h at 68°C. The membrane was then
rinsed in 2× SSC containing 0.05% SDS three times for 30 min at room
temperature, followed by two washes with 0.1× SSC containing 0.1% SDS for
40 min at 50°C. The full-length BCSG1 cDNA was isolated from the Blue-
script vector following EcoRI and XhoI digestion and was used as a template
for preparation of a random-labeled cDNA probe.

In Situ Hybridization. In situ hybridization was carried out as described
(14). Briefly, deparaffinized and acid-treated sections (5 μm thick) were
treated with proteinase K, prehybridized, and hybridized overnight with
digoxigenin-labeled antisense transcripts from a BCSG1 cDNA insert. The
BCSG1 antisense probe is a 550-bp full-length fragment. The probe was
generated by cutting the BCSG1 cDNA plasmid with PstI, followed by transcription
with T7 polymerase. Hybridization was followed by RNase treatment and three strin-
gent washings. Sections were incubated with mouse antidigoxigenin antibodies
(Boehringer Mannheim) followed by incubation with biotin-conjugated
secondary rabbit antimouse antibodies (DAKO). The colorimetric detections
were performed with a standard indirect streptavidin-biotin immunoreaction
method using the Universal LSAB Kit (DAKO) according to the manu-
facturer's instructions.



RESULTS 

Molecular Cloning of BCSG1 cDNA. We generated cDNA li-
braries from breast cancer biopsy specimens and patient-matched
normal breast and analyzed these libraries by EST sequencing. Ap- 
proximately 6000 ESTs were analyzed and assigned to different 
groups based on sequence overlapping, and 2200 unique EST groups 
were first analyzed for relative expression in the cDNA libraries from 
normal breast versus breast cancer tissue and then subjected to tissue- 
specific expression by examining the tissue origins of individual EST 
sequences against a large population of ESTs derived from a variety 
of different tissue types. As a result, we identified three classes of EST 
groups that were differentially expressed in normal breast versus 
breast cancer tissue. As a demonstration of this approach, Table 1 
shows a partial list of three classes of genes that are differentially 
expressed in normal breast versus breast cancer tissue. Class I repre- 
sents the genes more abundant in breast cancer than in normal breast 
and includes cathepsin D, a well-studied steroid regulated extracellu- 
lar matrix-degrading proteinase (15-17). Cathepsin D is thought to 
play a role in breast cancer metastasis (15-17) and has been proposed 
as a prognostic marker in breast cancer progression (18-21). As 
listed, there were five cathepsin D ESTs sequenced in the breast 
cancer cDNA library and only one EST in the normal breast cDNA 
library. Another proposed breast cancer metastasis-related gene and a 
prognostic marker for breast cancer, the Mr 67,000 laminin receptor
(22-26), was also picked up in this class by the differential cDNA 
sequencing approach. Class II represents genes that are more abundant 
in normal breast than in breast cancer.

Fig. 1. Comparison of the predicted amino acid sequence with the sequence of the non-Aβ
component of AD amyloid protein using SwissProt. After optimal alignment using the
Clustal method of the MegAlign program from the DNASTAR software package, the
putative protein shows a 54% sequence identity with the non-Aβ fragment of human AD
amyloid protein. Conserved amino acids are underlined.

Fig. 2. The expression of the BCSG1 gene in a variety of normal human adult tissues.
Twenty μg of total RNA from each of the above tissues were analyzed by Northern blot
using a random primer probe. A strong hybridizing band of about 1 kb was recognized in
the lane corresponding to RNA from adult brain. A weak 1-kb transcript was also detected
in testis, heart, spleen, colon, and ovary.



Although the genes in classes I and II are differentially expressed in
normal breast versus breast cancer tissue, these genes are not unique
to breast tissues. Class III is a special group of genes that are
selectively expressed in breast relative to other tissue types. The
tissue-specific expression of the unique gene was searched against
approximately 500,000 ESTs using the BLAST program (27). None
of these BCSGs except the first one matched any sequences in
public gene sequence databases. The automated screening revealed a
group of eight ESTs encoding a novel gene, BCSG1, in the partial
cDNA database containing approximately 500,000 ESTs. Of the eight
distinct EST clones in BCSG1, seven were discovered in
breast cDNA libraries and only one in a brain library. Of the seven
EST clones discovered in the breast cDNA libraries, six were
identified in the breast tumor library and only one in the normal breast
library. BCSG1 was chosen for analysis as a first putative breast
cancer marker gene because (a) its sequence has been matched with
a sequence in the public gene sequence database; and (b) most of the
individual EST sequences in BCSG1 were derived from a breast
tumor cDNA library. After sequencing analysis of all six EST clones
derived from the breast cancer library, one EST clone was found to
have a complete full-length sequence. The open reading frame of the
resulting full-length gene is predicted to encode a 127-amino acid
polypeptide. Comparison of the predicted amino acid sequence with
the sequence of a similar human protein is shown in Fig. 1. After
optimal alignment, the putative BCSG1-encoded protein shows 54%
sequence identity with the recently cloned non-Aβ fragment of human
AD amyloid protein (28).

Tissue Expression. The expression of the BCSG1 gene in a variety of
normal human tissues was analyzed by Northern blotting (Fig. 2). As
expected, the Northern blot showed that BCSG1 was abundantly
expressed as a 1-kb transcript in brain, which is the rich source for the
AD amyloid family gene. Similar bands of much lower
relative intensity were also obtained in ovary, testis,
colon, and heart. By contrast, no such band was present in the other
specimens analyzed, such as breast, kidney, liver, prostate, lung, small
intestine, thymus, and placenta.

Expression of BCSG1 in Human Breast Cancer Cells. In an
attempt to evaluate the potential biological significance of BCSG1 in
human breast cancer development and progression, we studied
BCSG1 gene expression in human breast cancer cells. Northern blot
(Fig. 3) detected the 1-kb BCSG1 transcript in two of four lines
derived from pleural effusion and four of four lines derived from
ductal infiltrating carcinomas. Among these lines, H3922 expressed
the highest level of BCSG1 mRNA. The absence of BCSG1 mRNA
in some breast cancer cell lines may suggest that the expression of the
BCSG1 gene requires specific in vivo conditions, or that it is induced
by interactions between the tumor cells and stromal cells.

To localize the cellular source of the BCSG1 expression and to further
assess the biological relevance of the overexpression of BCSG1 in breast
cancers, we next performed in situ hybridization on fixed breast sections
from 20 infiltrating carcinomas, 15 DCISs, and 18 benign breast lesions,
including 5 reduction mammoplasty specimens, 8 breast hyperplasias,
and 5 fibroadenomas. In these experiments, we examined two aspects of
BCSG1 expression: the tissue localization (stromal versus
epithelial) and the correlation of BCSG1 expression with the breast cancer
malignant phenotype. There was a wide variation in staining intensity for
BCSG1 expression among the human breast cancer specimens. Because
the colorimetric in situ hybridization is not quantitative, the tissue sam-
ples were classified into either positive or negative staining for BCSG1
expression; no attempt was made to differentiate the levels of expression
of BCSG1 among positive-staining specimens. The negative cases were
confirmed with at least two independent experiments. All stainings were
reviewed by at least two people. Fig. 4 shows a representative in situ
hybridization for BCSG1. We found a strongly positive BCSG1 hybrid-
ization in neoplastic epithelial cells of highly infiltrating breast carcino-
mas (Fig. 4, A and B). The expression of BCSG1 mRNA was detectable
in the neoplastic epithelial cells in 17 of 20 infiltrating breast carcinomas.
No expression of BCSG1 was detected in the stromal cells. In contrast,
expression of BCSG1 was absent in 16 of 18 cases of normal or
benign breast lesions. Representative negative stainings of BCSG1 in
normal ductal breast epithelial cells (Fig. 4E), a benign proliferative
breast lesion (Fig. 4F), and a benign fibroadenoma (Fig. 4G) are pre-
sented. Furthermore, as demonstrated in Fig. 4B for a highly invasive
breast carcinoma, no detectable signal of BCSG1 expression was evident
in the residual normal lobular breast epithelial cells, although the sur-
rounding invasive breast carcinoma cells were stained positive for
BCSG1 expression. The presence of BCSG1 transcript in human breast
tissue and its overexpression in breast carcinomas are consistent with our
differential cDNA sequencing cloning strategy, which suggests a possible

Fig. 3. Northern blot analysis of BCSG1 expression in human breast cancer cell lines.
Total RNA was isolated and analyzed (20 μg/lane) by Northern blot. After hybridization
and washing, the filter was exposed to X-ray film for 48 h. The integrity and the loading
control of the RNAs were ascertained by direct visualization of the 18S rRNA in the stained
gel. Lane 1, H3396 (derived from pleural effusion); Lane 2, MCF7 (derived from pleural
effusion); Lane 3, SKBR-3 (derived from pleural effusion); Lane 4, MDA-MB-231
(derived from pleural effusion); Lane 5, H3914 (derived from infiltrating ductal carcino-
ma); Lane 6, H3922 (derived from infiltrating ductal carcinoma); Lane 7, ZR-75-1
(derived from infiltrating ductal carcinoma); Lane 8, T47D (derived from infiltrating
ductal carcinoma). Cell lines T47D, ZR-75-1, SKBR-3, MCF-7, and MDA-MB-231 are
from American Type Culture Collection; all other lines were isolated initially at Bristol-
Myers Squibb Pharmaceutical Research Institute.

from virtually no detectable expression in normal or benign breast to
partial expression (7 of 15) in the in situ breast carcinoma and to the
high expression (17 of 20) in the infiltrating malignant breast carci-
nomas, suggest an association of BCSG1 expression with breast
cancer malignant progression. On the basis of this BCSG1 expression
pattern, we propose that BCSG1 may potentially be used as a breast
cancer progression marker.
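
The stage-wise pattern can be tabulated directly from the counts reported above. The following Python sketch is illustrative only: the function name is hypothetical, and the only data used are the case counts given in the text (expression absent in 16 of 18 normal or benign cases, hence 2 of 18 positive; 7 of 15 DCISs positive; 17 of 20 infiltrating carcinomas positive). It computes simple percentages and makes no statistical claim.

```python
# Illustrative sketch only; names are hypothetical.
# (group, positive cases, total cases) as reported for in situ hybridization.
ISH_RESULTS = [
    ("normal or benign breast", 2, 18),   # expression absent in 16 of 18
    ("carcinoma in situ (DCIS)", 7, 15),
    ("infiltrating carcinoma", 17, 20),
]


def percent_positive(positive, total):
    """Percentage of cases scored positive for BCSG1 staining."""
    if total <= 0:
        raise ValueError("total must be a positive number of cases")
    return 100.0 * positive / total


if __name__ == "__main__":
    for group, positive, total in ISH_RESULTS:
        rate = percent_positive(positive, total)
        print(f"{group}: {positive}/{total} positive ({rate:.1f}%)")
```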

DISCUSSION 

More than 190,000 new cases of breast cancer are diagnosed in the 
United States every year, with incidence increasing by approximately 
1% annually (29, 30). Studies linked to the discovery of new genetic 
markers will provide new information leading to the understanding of 
breast cancer development and progression. There are two classes of 
genes affecting tumor development. Genes influencing the cancer
phenotype that act directly as a result of changes (e.g., mutation) at the
DNA level, such as BRCA1, BRCA2, and p53, are called Class I genes.
The Class II genes affect the phenotype by modulation at the expres-
sion level. Development of breast cancer and subsequent malignant
progression is associated with alterations of a variety of genes of both
classes. Many new predictive and prognostic factors have been pro-
posed and studied for breast cancer. HER-2/neu-positive tumors
respond poorly to endocrine treatment (31, 32). p53 alteration indicates
a poorer prognosis and a poor response to tamoxifen (33,
34). The lack of Nm23 expression is indicative of meta-
static potential and poor prognosis in invasive ductal carcinoma (35).
Cathepsin D, a protease suggested to have a role in breast cancer,
appears to affect the potential for invasive growth (11, 14, 36).
Positive immunostaining of tumor sections with Factor VIII antibod-
ies seems to be a marker for angiogenesis (37-39). It has been
postulated that these tumors are targets for antiangiogenesis drug
treatment. Expression of the mdr-1 gene is proposed to be an indicator
of multidrug resistance (38-40). Poor response to endocrine therapy
has been indicated for urokinase-type plasminogen activator/plasmin-
ogen activator inhibitor-1, a plasminogen activator inhibitor (21).
Also receiving major attention are the familial breast cancer-related
genes BRCA1 and BRCA2 (40-42). With the availability of tens of
thousands of EST sequences, we have, using differential cDNA se-
quencing, identified a new putative breast cancer marker gene, BCSG1,
and studied its expression in breast cancer.

The differential cDNA sequencing method described here is a 
direct approach that utilizes an automatic EST analysis on a pair of 
cDNA libraries. Unlike previously described methods, the differential 
cDNA sequencing approach allows one to identify differentially ex- 
pressed genes or patterns of genes directly from a computer database. 
With the advancement of more efficient and rapid sequencing tech- 
nology, the direct differential cDNA sequencing approach may offer 
a powerful method for simultaneous analysis of the expression profile 
of thousands of genes, as well as for the discovery of novel genes of 
clinical interest.

Using in situ hybridization analysis, we have demonstrated the
expression of BCSG1 transcripts in the neoplastic epithelial cells of
infiltrating breast carcinoma but not in epithelial cells of normal and
benign breast. The overexpression (17 of 20) of BCSG1 in malignant
infiltrating breast epithelial cells compared to the partial expression (7
of 15) in in situ carcinoma suggests that up-regulation of BCSG1
expression is associated with breast cancer malignant progression and
may signal the more advanced invasive/metastatic phenotype of hu-
man breast cancer. This implication is supported further by the de-
tection of BCSG1 expression in six of eight aggressive comedo-type
DCISs and in only one of seven non-comedo-type DCISs. It is
unlikely that BCSG1 is overexpressed as a secondary effect of cellular
proliferation, because no detectable BCSG1 expression is evident in
rapidly proliferating nonmalignant breast lesions (Fig. 4F).

It will be interesting to investigate whether BCSG1 expression in
DCIS may indicate a malignant progression leading to invasion and
metastasis. There is cause for concern about the large number of DCIS
cases that are being diagnosed as a consequence of screening mam-
mography, most of which are treated by some form of surgery. In
addition, the proportion of cases treated by mastectomy may be
inappropriately high (30). DCIS by definition has an intact basement
membrane by light microscopy (43). Defective basement membranes,
however, have been found when they are stained with periodic acid-
Schiff reagent and when they are examined by electron microscopy
(44). In fact, it has been reported that re-evaluation by experienced
pathologists showed that 28 and 15% of previously diagnosed DCISs
demonstrated invasion (45, 46). If BCSG1 expression can provide
some prognostic information for distinguishing the DCIS that is not
likely to become invasive from the DCIS that is most likely to become
invasive, this will help to direct treatment strategies and to reduce
some inappropriate or unnecessary mastectomies.

It is interesting to note that the predicted amino acid sequence of the
BCSG1 gene shares a high sequence homology with the non-Aβ
component of the AD amyloid precursor protein (28). A neuropatho-
logical hallmark of AD is a widespread amyloid deposition resulting
from β-amyloid precursor proteins. β-Amyloid precursor proteins are
large, membrane-spanning proteins that give rise either to the β-A4
peptide (Aβ fragment; Ref. 47) or to a non-Aβ component of AD
amyloid (28) that is either deposited in AD amyloid plaques or
yields soluble forms. Although the insoluble membrane-bound AD
amyloid destabilizes calcium homeostasis and thus renders cells vul-
nerable to excitotoxic conditions of calcium influx resulting from
energy deprivation or overexcitation (48), the soluble AD amyloid
proteins are neuroprotective against glucose deprivation and gluta-
mate toxicity, perhaps through their ability to lower the intraneuronal
calcium concentration (49). We currently do not know whether
BCSG1 is an instigator or a by-product during breast cancer progres-
sion. With the availability of anti-BCSG1 antibody to localize BCSG1
protein and the recombinant BCSG1 protein, we may start to speculate
that BCSG1, like soluble AD amyloid, may be potentially involved in
protection from tissue damage resulting from tissue remodeling due to
the local cancer invasion. An elucidation of the reasons for BCSG1
overexpression in infiltrating breast cancer cells may shed some light
on the pathogenesis of breast cancer progression. Nevertheless, we
demonstrated a stage-specific BCSG1 expression and an association
of BCSG1 overexpression with clinical aggressiveness of breast can-
cers. The notion that BCSG1 overexpression may indicate breast
cancer malignant progression from benign breast or in situ carcinoma
to the highly infiltrating carcinoma warrants further investigation.

THE THREE-DIMENSIONAL
STRUCTURE OF PROTEINS



In Chapter 5 we intro-
duced the concept of protein primary structure. We emphasized that this first
level of organization, the amino acid sequence, is dictated by the DNA sequence
of the gene for each protein. Most proteins exhibit higher levels of structural or-
ganization. It is the specific three-dimensional structure of each protein that al-
lows it to function in its particular biological role.

Figure 6.1 depicts another representation of the three-dimensional confor-
mation of the myoglobin molecule we showed in Figure 5.1. A specific 3-D struc-
ture means that every one of the thousands of atoms in a protein molecule has a
particular, well-defined spatial location within the molecule. This characteristic
is emphasized in Figure 5.1. Figure 6.1, on the other hand, has been drawn to
point out that there exist two distinguishable levels of three-dimensional folding
of the polypeptide chain. First, the chain appears to be locally coiled into regions
of helical structure (labeled A-H in the figure). Such local regular folding is called
the secondary structure of the molecule. The helically coiled regions are in turn
folded into a specific compact structure for the entire polypeptide chain. We call
this further level of folding the tertiary structure of the molecule. Later in this
chapter we shall find that some proteins consist of several polypeptide chains,
arranged in a regular manner. This arrangement we designate as the quaternary
level of organization.

This chapter is devoted to an examination of the several levels of protein
structure: their geometry, how they are stabilized, and their importance in pro-
tein function.



Protein molecules have four levels of 
structural organization: primary (se- 
quence), secondary (local folding), 
tertiary (overall folding), and quater- 
nary (multichain association). 



SECONDARY STRUCTURE: REGULAR WAYS TO FOLD
THE POLYPEPTIDE CHAIN

THE DISCOVERY OF REGULAR POLYPEPTIDE STRUCTURES

FIGURE 6.1

Three-dimensional folding of the protein myoglobin.

Each amino acid is indicated by a circle corresponding to its α car-
bon atom. Side chains are omitted, to emphasize how the poly-
peptide backbone is wrapped into helices and folded. Individual
α-helical regions are labeled A-H, with turn regions designated
by two letters (e.g., GH). This protein folds about a heme group
(shown in purple), a planar heterocyclic structure that chelates iron
and serves as the oxygen binding site.

© Irving Geis.

Understanding of protein secondary structure had its origins in the re-
markable work of Linus Pauling, perhaps the greatest chemist of the twentieth
century. As early as the 1930s, he had begun x-ray diffraction studies of amino
acids and small peptides, with the aim of eventually analyzing protein structure. In
the early 1950s, Pauling and his collaborators used these data together with un-
usual scientific intuition to begin a systematic analysis of the possible regular
conformations of the polypeptide chain. They postulated several principles that
any such structure must obey:

1. The bond lengths and bond angles should be distorted as little as possible
from those found through x-ray diffraction studies of amino acids and
peptides, as shown in Figure 5.12b (page 141).

2. No two atoms should approach one another more closely than is allowed
by their van der Waals radii.

3. The amide group must remain planar and in the trans configuration, as
shown in Figure 5.12b. (This feature had been recognized in the earlier
x-ray diffraction studies of small peptides.) Consequently, rotation is pos-
sible only about the two bonds adjacent to the α carbon in each amino
acid residue, as shown in Figure 6.2.

FIGURE 6.2

Rotation around the bonds in a polypeptide chain. Two
adjacent amide planes are shown in teal. Rotation is allowed only
about the N-Cα and Cα-C(=O) bonds. The angles of rotation
about these bonds are defined as φ and ψ, respectively, with direc-
tions defined as positive rotation as shown by the arrows. The ex-
tended conformation of the chain shown here corresponds to φ =
+180°, ψ = +180°.

© Irving Geis.



4. Some kind of noncovalent bonding is necessary to stabilize a regular fold-
ing. The most obvious possibility is hydrogen bonding between amide pro-
tons and carbonyl oxygens:

    N-H ··· O=C

Such a concept was natural to Pauling, who had had much to do with the
development of the idea of H bonds. In summary, the preferred conforma-
tions must be those that allow a maximum amount of hydrogen bonding,
yet satisfy criteria 1-3.

Working mainly with molecular models, Pauling and his associates were able
to arrive at a small number of regular conformations that satisfied all of these
criteria. Some were helical structures formed by a single polypeptide chain, and
some were sheetlike structures formed by adjacent chains. The two structures
they proposed as most likely, the α helix and the β sheet, are shown in Figure 6.3a
and b. These two structures turned out, in fact, to be the most common sec-
ondary structures in proteins. Figure 6.4 shows two other polypeptide helices
that have since been defined. The 3₁₀ helix is observed in some proteins but is
not as common as the α helix. The π helix, though sterically possible, has not
been observed, possibly because it has a hole down the middle too big to allow
van der Waals interactions but too small to admit potentially stabilizing water
molecules. All of the structures shown in Figures 6.3 and 6.4 satisfy the criteria
listed earlier. In particular, in each structure the peptide group is planar, and
every amide proton and every carbonyl oxygen (except a few near the ends of he-
lices) is involved in hydrogen bonding. Each of these forms constitutes a possible
kind of secondary structure in proteins.



Of the several possible secondary
structures for polypeptides, the
most important are the α helix, the
β sheet, and the 3₁₀ helix.

FIGURE 6.3

The α helix and β sheet. These are the two
most important regular secondary structures
of polypeptides. Their existence was predicted
by Linus Pauling. (a) In the α helix the hydro-
gen bonds are within a single polypeptide
chain. (b) In the β sheet, the hydrogen bonds
are between adjacent chains, of which only
two are shown here.

© Irving Geis.

Panels: (a) α helix; (b) β sheet.



DESCRIBING THE STRUCTURES: MOLECULAR HELICES
AND PLEATED SHEETS

In Tools of Biochemistry 4B, we listed the distances that define a molecular helix:
the crystallographic repeat (c), the pitch (p), and the rise (h). We also pointed out
that helices may be either right-handed or left-handed and may contain either an
integral number of residues per turn or a nonintegral number. We call the num-
ber of residues per turn n and the number of residues per repeat m. The latter
number must always be an integer, because it defines an exact repeat of the struc-
ture. If there is an integral number of residues per turn, the pitch and the repeat
will be equal, and n = m. Some helices of this kind are illustrated schematically in
Figure 6.5. Note that as the number of residues per turn increases, the structure
changes progressively from a flat ribbon to a broad helix and eventually to a
closed ring with p = 0. Not all of these structures are found in polypeptides. For
example, the single-chain n = 2 structure shown in Figure 6.5 does not occur in
nature, as we will explain shortly.

One of Pauling's major insights was to recognize that polypeptide helices do
not have to have an integral number of residues per turn. For example, the α


FIGURE 6.4

Other possible secondary structures of
polypeptides. (a) The 3₁₀ helix is found in
proteins but is less common than the α helix
shown in Figure 6.3a. (b) The π (or 4.4₁₆)
helix is sterically possible but so far has not
been observed in proteins.

© Irving Geis.

Panels: (a) 3₁₀ helix; (b) π helix.

FIGURE 6.5

Idealized helices. These hypothetical struc-
tures show the effect of varying the number
(n) of polypeptide residues per turn of a helix.
In each case the pitch (p) is indicated, and for
[...] the rise (h) is also shown. Polypeptides
can form helices ranging from a closed ring
(n = 5, p = 0) to a 2-fold helix (n = 2), with
each residue rotated by 180° with respect to
the preceding one. All the integral positive
values of n and one example of a negative
value are shown here. The n = 4 and n = 3
helices are right-handed, the n = -3 helix is
left-handed, and n = 5 (a ring) and n = 2
(a ribbon) have no handedness. The right-
handed α helix (not shown here), with
n = 3.6, is intermediate between the n = 3
and n = 4 structures.

© Irving Geis.

Panels: n = 5, ring; n = 4, helix (right-handed); n = 3, helix (right-handed);
n = 2, flat ribbon; n = -3, helix (left-handed).

FIGURE 6.6

Hydrogen bonding patterns for four
helices. The structures are represented in a
diagrammatic way to simplify counting the
atoms in each H-bonded loop. For example,
there are 13 atoms in the H-bonded loop correspond-
ing to the α (3.6₁₃) helix.

© Irving Geis.



TABLE 6.1 Parameters of some polypeptide secondary structures

                                                  Number of Atoms
Structure Type          Residues/   Rise (nm)     in H-Bonded       φ (deg)   ψ (deg)
                        Turn                      Ring

Antiparallel β sheet    2.0         0.34          (a)               -139      135
Parallel β sheet        2.0         0.32          (a)               -119      113
3₁₀ helix               3.0         0.20          10                -49       -26
α helix                 3.6         0.15          13                -57       -47
π helix (b)             4.4         0.12          16                -57       -70

(a) Bonding is between polypeptide chains.
(b) Sterically permitted but not observed in protein.



helix repeats after exactly 18 residues, which amounts to 5 turns. It has, therefore,
3.6 residues per turn. Since the pitch of a helix is given by p = nh, we have for the
α helix, with a rise of 0.15 nm/residue, p = 3.6 (res/turn) × 0.15 (nm/res) = 0.54
nm/turn. Parameters for the other helices shown in Figures 6.3 and 6.4 are listed
in Table 6.1.
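
The relation p = nh, and the requirement that a helix repeat after a whole number of residues, can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, and the residues-per-turn and rise values for the 3₁₀ and π helices are taken from Table 6.1 on the assumption that its rows are read in the order 3₁₀, α, π.

```python
from fractions import Fraction

# Illustrative sketch only; function names are hypothetical.
# (residues per turn as text, rise per residue in nm). The alpha-helix values
# are those worked in the text; the other rows are assumed from Table 6.1.
HELICES = {
    "3_10 helix": ("3.0", 0.20),
    "alpha helix": ("3.6", 0.15),
    "pi helix": ("4.4", 0.12),
}


def pitch_nm(residues_per_turn, rise_nm):
    """Pitch p = n * h: distance advanced along the helix axis per turn."""
    return residues_per_turn * rise_nm


def exact_repeat(residues_per_turn_text):
    """Smallest (residues, turns) pair after which the helix repeats exactly.

    Writing n as a reduced fraction m / t gives m residues in t turns.
    """
    ratio = Fraction(residues_per_turn_text)
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    for name, (n_text, rise) in HELICES.items():
        residues, turns = exact_repeat(n_text)
        pitch = pitch_nm(float(n_text), rise)
        print(f"{name}: pitch {pitch:.3f} nm/turn, "
              f"repeats after {residues} residues ({turns} turns)")
```

For the α helix this reproduces the figures in the text: a pitch of 0.54 nm per turn and an exact repeat after 18 residues in 5 turns.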

The parameters defined above describe most molecular helices. For polypep-
tide helices, which involve hydrogen bonding, there is an additional important
quantity. If you examine the model for the α helix (Figure 6.3a), you will note
that each carbonyl oxygen is hydrogen-bonded to the amide proton on the fourth
residue up the helix. Thus, if we include the hydrogen bond, a loop of 13 atoms is
formed, as shown in Figure 6.6. Each of the helices shown in Figures 6.3 and 6.4
has a different number of atoms in such a hydrogen-bonded loop. We shall call
this number N. A quick way to describe a polypeptide helix, then, is by the short-
hand n_N (written with N as a subscript), where n is the number of residues per
turn. The 3₁₀ helix fits this de-
scription; it has exactly 3.0 residues per turn and a 10-member loop. The α helix
could also be called a 3.6₁₃ helix, and the π helix a 4.4₁₆ helix.
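
The loop sizes quoted above follow from counting backbone atoms. The Python sketch below is illustrative only: the function names are hypothetical, and the counting rule N = 3k + 1 is a standard derivation that the text does not state explicitly. It assumes the carbonyl oxygen of residue i is hydrogen-bonded to the amide proton of residue i + k, so that the loop contains O and C of residue i, the three backbone atoms (N, Cα, C) of each of the k - 1 residues in between, and N and H of residue i + k.

```python
# Illustrative sketch only; function names are hypothetical.

def atoms_in_hbonded_loop(k):
    """Number of atoms N in the loop closed by a C=O(i) ... H-N(i+k) hydrogen bond."""
    if k < 1:
        raise ValueError("k must be at least 1")
    carbonyl_atoms = 2             # O and C of residue i
    intervening = 3 * (k - 1)      # N, C-alpha, C of residues i+1 .. i+k-1
    amide_atoms = 2                # N and H of residue i+k
    return carbonyl_atoms + intervening + amide_atoms


def helix_shorthand(residues_per_turn, k):
    """Shorthand n_N for a helix with n residues per turn and an i -> i+k H bond."""
    return f"{residues_per_turn}_{atoms_in_hbonded_loop(k)}"


if __name__ == "__main__":
    # k = 3, 4, 5 correspond to the 3_10, alpha, and pi helices, respectively.
    for name, n, k in [("3_10 helix", "3", 3), ("alpha helix", "3.6", 4), ("pi helix", "4.4", 5)]:
        print(f"{name}: {helix_shorthand(n, k)}")
```

With k = 4, the bonding to the fourth residue up the helix described for the α helix, the count is 13, in agreement with Figure 6.6.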

Because hydrogen bonds tend to be linear, the atoms -N-H···O= in
polypeptide helices should lie on a straight line. If you examine Figures 6.3 and 6.4
you will see that this requirement is at least approximately satisfied for the 3₁₀, α,
and π helices. However, it is very difficult to make helices with only two residues
per turn and linear hydrogen bonds between residues in the same chain. Therefore,
the only n = 2 structure that is found in proteins is not the flat ribbon shown in
Figure 6.5 but the β pleated sheet structure shown in Figure 6.3b. In the β pleated
sheet, each residue is rotated by 180° with respect to the preceding one, which
makes each chain an n = 2 helix. If the chains are also folded in an accordion-like
fashion, linear hydrogen bonds can occur between adjacent chains. Forming in-
terchain bonds allows correct bond angles with minimal strain when n = 2. Fig-
ure 6.7 shows the two ways in which this can be done. The chains can have their
N → C directions running parallel, to make the parallel β sheet, or they can be
antiparallel. It is instructive to try to form such structures using molecular mod-
els. You will find that making an n = 2 helix with internal H bonds is awkward,
but the pleated-sheet structures form naturally.

FIGURE 6.7

Two kinds of β sheet. The arrows point in the N → C direc-
tion in each chain. Gray = carbon, purple = nitrogen, red = oxy-
gen, blue = hydrogen, white = side chain.

Panels: (a) Parallel β sheet; (b) Antiparallel β sheet.

Thus, possible secondary protein structures fall into two general classes: vari-
ous helices and at least two types of pleated sheet. But just because a structure can
be drawn and contains good H bonds does not mean that it necessarily exists. Many
kinds of chain conformations can be imagined that are sterically impossible be-
cause atoms in the backbone and/or side chains would overlap. These steric re-
strictions can be fully appreciated only by examining space-filling models, as
shown in Figure 6.8. The α helix, for example, is fully packed. To examine steric
crowding in a systematic way, we need a general procedure for describing poly-
peptide conformations.

Ramachandran Plots

As was shown in Figure 6.2, each residue in a polypeptide chain has two back-
bone bonds about which rotation is permitted: the bond between the nitrogen
and the α carbon, and the bond between the α carbon and the carbonyl carbon.



FIGURE 6.8 



Segment of an α helix shown as a space-
filling model. The segment illustrated is
from the E helix in sperm whale myoglobin
(see Figure 6.1).

Courtesy of Richard J. Feldman, National Institutes of Health.



